JoKons/roman18 (forked from MiMoText/roman18)

roman18

Collection de romans français du dix-huitième siècle (1750-1800) / Collection of Eighteenth-Century French Novels (1750-1800)

Introduction

This collection of Eighteenth-Century French Novels contains digital texts of novels written or first published between 1751 and 1800. The collection has been created as part of Mining and Modeling Text, a project based at the Trier Center for Digital Humanities (TCDH) at Trier University. Work on the collection is ongoing.

Corpus building

In a first step, about 40 novels were carefully transcribed by double keying. Using this first group of novels, an OCR model was trained in cooperation with Christian Reul (University of Würzburg), one of the developers of OCR4all. The result is an OCR model for French prints of the late eighteenth century, which will soon be available within OCR4all.
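In double keying, two people typically transcribe the same text independently, and the places where their transcriptions disagree are then checked against the print. The following minimal sketch, which is not part of the repository, shows how such disagreements could be located with Python's standard difflib module. The function name keying_discrepancies and the sample strings are hypothetical.

```python
import difflib


def keying_discrepancies(keying_a: str, keying_b: str) -> list[tuple[str, str, str]]:
    """Return (operation, text_a, text_b) for every span where two
    independent transcriptions of the same page disagree.

    Hypothetical helper for illustration only; not taken from roman18.
    """
    # autojunk=False stops difflib from ignoring frequent characters,
    # which matters for long pages of running text.
    matcher = difflib.SequenceMatcher(a=keying_a, b=keying_b, autojunk=False)
    discrepancies = []
    for op, a0, a1, b0, b1 in matcher.get_opcodes():
        if op != "equal":  # keep only replace / delete / insert spans
            discrepancies.append((op, keying_a[a0:a1], keying_b[b0:b1]))
    return discrepancies


# Hypothetical example: the second keyer typed "vne" instead of "une".
print(keying_discrepancies("Il étoit une fois", "Il étoit vne fois"))
```

Each reported span would then be resolved by consulting the scan, so that the final transcription agrees with the original print.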

By applying this OCR model to additional scans from sources such as Gallica (bnf.fr) and HathiTrust, we are now producing a second group of novels that are not yet digitally available, or available only in low quality.

A third group of texts, based on existing full texts (from Gallica, Google Books or Wikisource), will hopefully help us reach about 200 novels by the end of 2020.

At the moment, corpus composition depends primarily on pragmatic criteria. We are currently collecting metadata and plan to make it available so that more principled subcorpora can be created. The overall production of novels in this period is documented in Angus Martin, Vivienne G. Mylne and Richard Frautschi, Bibliographie du genre romanesque français 1751-1800 (1977). Our goal is to use this metadata to balance our corpus of texts.
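The README does not specify how the balancing will be done. As one possible illustration, the minimal sketch below compares the share of novels per decade in the corpus with the share in a reference list, for example years of first publication taken from a bibliography such as Martin, Mylne and Frautschi. Using the decade of first publication as the balancing dimension is an assumption, and the function names are hypothetical.

```python
from collections import Counter


def decade_shares(years: list[int]) -> dict[int, float]:
    """Share of novels per decade (1750 stands for 1750-1759, etc.)."""
    counts = Counter((year // 10) * 10 for year in years)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {decade: n / total for decade, n in counts.items()}


def representation_gaps(corpus_years: list[int],
                        reference_years: list[int]) -> dict[int, float]:
    """Corpus share minus reference share for each decade.

    Positive values mean a decade is over-represented in the corpus,
    negative values mean it is under-represented. Hypothetical helper,
    not part of roman18.
    """
    corpus = decade_shares(corpus_years)
    reference = decade_shares(reference_years)
    decades = sorted(set(corpus) | set(reference))
    return {d: corpus.get(d, 0.0) - reference.get(d, 0.0) for d in decades}
```

Decades with the most negative values would be natural candidates when choosing which novels to add next.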

Formats

The texts are provided in several formats. For the texts from the first group, the original double-keying files are available. In addition, a cleaned-up XML version closely reflecting the layout of the original documents is available (folder XML4OCR).

The master format for all texts is XML following the Guidelines of the Text Encoding Initiative (TEI) (folder XML-TEI). The files are encoded according to a relatively restrictive schema developed in the COST Action ‘Distant Reading for European Literary History’. We also provide plain-text versions of the texts; however, these are best generated to suit individual needs with the script “get_text.py” (in the Scripts folder).
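To show the general idea behind such a conversion, here is a minimal sketch using only Python's standard library. It is not the repository's get_text.py, whose options are not reproduced here. It assumes the files use the standard TEI namespace and that the running text sits in <p> elements under text/body; verse lines (<l>), headings and other elements are ignored, and nested notes inside paragraphs are kept inline. The function name tei_to_plain_text is hypothetical.

```python
import xml.etree.ElementTree as ET

TEI = "http://www.tei-c.org/ns/1.0"
NS = {"tei": TEI}


def tei_to_plain_text(source) -> str:
    """Return the paragraphs of a TEI file's <body> as plain text,
    one paragraph per line.

    `source` is a file path or file object. Hypothetical helper for
    illustration; it does not replicate get_text.py.
    """
    root = ET.parse(source).getroot()
    body = root.find("tei:text/tei:body", NS)
    if body is None:
        return ""
    paragraphs = []
    for p in body.iter(f"{{{TEI}}}p"):
        # itertext() also collects text inside nested elements such as <hi>;
        # split/join collapses line breaks and repeated whitespace.
        text = " ".join("".join(p.itertext()).split())
        if text:
            paragraphs.append(text)
    return "\n".join(paragraphs)
```

Usage would look like `print(tei_to_plain_text("XML-TEI/some_novel.xml"))`, where the file name is a placeholder. Writing one paragraph per line keeps paragraph boundaries, which many text-analysis tools can use.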

Licence

All texts are in the public domain and can be reused without restrictions. We don’t claim any copyright or other rights on the transcription, markup or metadata. If you use our texts, for example in research or teaching, please reference this collection using the citation suggestion below.

Citation suggestion

Collection de romans français du dix-huitième siècle (1750-1800) / Eighteenth-Century French Novels (1750-1800), edited by Julia Röttgermann, Julia Dudar and Christof Schöch. Trier: Trier University, 2020. URL: https://github.com/mimotext/roman18.


Languages

  • HTML 72.3%
  • Jupyter Notebook 26.6%
  • Python 1.1%