Tue, 09/14/2021 - 1:52pm Nicola Wilson

by Annika Heijna

One of the worst feelings as a student is realizing that no one apart from your professor is ever going to read that meticulously researched, spend-hours-making-the-bibliography-just-right paper you’re so proud of. All the hours of work we put into our assignments are basically lost once it is determined that the quality is sufficient for a passing grade: the submitted document disappears, and no one will ever look at it again.

So imagine my excitement when I started the MA Book and Digital Media Studies (BDMS) at Leiden University this year and Dr. Verhaar told our Digital Media Technology class that we would be contributing to a real project while developing our skills in XML and XSLT. In previous years, this project has always been the Booktrade Correspondence Project (BCP), initiated by Prof. Adriaan van der Weel, who is (not altogether coincidentally) also the founder of the BDMS MA. The BCP website is currently unavailable, but the project is still very much alive, and a new homepage is expected to be up and running sometime next year. The project examines the internationalization of the Dutch book trade in the nineteenth century on the basis of the corporate archives of two major publishers: A.W. Sijthoff and De Erven F. Bohn. One of its aims is to provide machine-readable transcriptions of the letters found in these archives, and that is where the students come in. Each student is given a digital scan of one or two letters and asked not only to transcribe the letter but also to encode it according to the TEI P5 Guidelines. In this way, a digital repository has slowly been built up over the years, and it serves as a secondary object of study for the BCP, which also aims to critically examine the potential of such a digital correspondence repository for book-historical research.

Alas, this symbiotic relationship between the BCP and the BDMS students could not last forever. Unsurprisingly, most of the letters in the archives are in Dutch, while Book and Digital Media Studies is an English-taught program that attracts many international students who do not speak Dutch. By the start of this academic year, previous classes had already transcribed virtually all of the English-language letters in the archives.

To his credit, however, Dr. Verhaar found us new material to transcribe by setting up a collaboration with the researchers of MAPP, the Modernist Archives Publishing Project. Although every letter in the MAPP archive comes with a short description of its scope and content, none currently has a full transcription, which rules out, for example, full-text search. Our collaboration means we can produce reliable, encoded transcriptions that help make the letters held in the MAPP database more digitally accessible.

Transcribing the letters 

And that is how we found ourselves transcribing letters related to the publication of Vita Sackville-West’s novel The Edwardians. Being stereotypically digiphobic humanities students, most of us had never even heard of XML or TEI before, let alone worked with them. Fortunately, they turned out to be far less intimidating than they first looked. To make the letters searchable and as useful as possible for different kinds of research, we did not just transcribe the text of each letter: we also tagged its textual elements, identified the persons and organizations mentioned and linked them to their Wikidata entries, and added editorial annotations.

All the tagging, linking, and Googling (I now keep getting advertisements for trips to Iran because I was trying to find the titles of ‘the two books on Persia’) is a lot of extra work compared to plain-text transcription, but the resulting product is much more user-friendly. For example, this letter will now turn up in searches for letters mentioning ‘Hogarth Press’ or ‘Passenger to Teheran’, even though these words do not literally appear in the text. Improvements like these may seem small, but they can be essential for exactly the kind of research into publishing history that MAPP encourages. You might want to know, for example, the earliest letter in which a specific work is mentioned, even from before it had the title it would eventually be published under. That is virtually impossible with a plain-text search, since you do not know the exact phrase used to refer to the work, but it is only one click away if every reference to a title has been tagged as such. Of course, to a human reader, the encoded version of the text does not look very nice. The beauty of XML, however, is that it keeps content entirely separate from presentation: someone else can later decide how the text should be displayed without ever having to change the encoding in the XML file.
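To make the point about tagged references concrete, here is a minimal Python sketch. It assumes that each transcription tags a mentioned work as a TEI `<title>` (and a publisher as `<orgName>`) carrying a shared `@ref` key. The letters, dates, and keys below are invented for illustration and are not MAPP transcriptions; a real edition might point `@ref` at a Wikidata entry instead. The sketch finds every letter that refers to a work however it is worded, lists them earliest first, and compares the result with a naive plain-text search.

```python
import xml.etree.ElementTree as ET

NS = "http://www.tei-c.org/ns/1.0"  # the TEI P5 namespace
TEI = {"tei": NS}


def letter(body_text: str) -> str:
    """Wrap one paragraph of (invented) letter text in a minimal TEI <body>."""
    return f'<body xmlns="{NS}"><p>{body_text}</p></body>'


# Invented toy letters, NOT real MAPP transcriptions: letter id -> (date, TEI).
# The keys "#passenger_to_teheran" and "#hogarth_press" are hypothetical.
LETTERS = {
    "letter_a": ("1926-05-10", letter(
        'Thank you for <title ref="#passenger_to_teheran">Passenger to Teheran</title>.')),
    "letter_b": ("1926-02-20", letter(
        'The <title ref="#passenger_to_teheran">Persia book</title> is nearly done.')),
    "letter_c": ("1926-04-01", letter(
        'I hope <orgName ref="#hogarth_press">the Press</orgName> can print it soon.')),
}


def letters_mentioning(letters, key, tag="title"):
    """Ids of letters with a <tag> element whose @ref equals key, earliest first."""
    hits = []
    for letter_id, (date, xml_text) in letters.items():
        root = ET.fromstring(xml_text)
        refs = {el.get("ref") for el in root.iterfind(f".//tei:{tag}", TEI)}
        if key in refs:
            hits.append((date, letter_id))
    # ISO-8601 date strings sort chronologically as plain strings.
    return [letter_id for _, letter_id in sorted(hits)]


def plain_text_hits(letters, phrase):
    """Naive baseline: ids of letters whose raw text contains the exact phrase."""
    return sorted(i for i, (_, xml_text) in letters.items() if phrase in xml_text)


print(letters_mentioning(LETTERS, "#passenger_to_teheran"))  # ['letter_b', 'letter_a']
print(plain_text_hits(LETTERS, "Passenger to Teheran"))      # ['letter_a']
print(letters_mentioning(LETTERS, "#hogarth_press", tag="orgName"))  # ['letter_c']
```

The tag-based search finds the earlier ‘Persia book’ letter that the phrase search misses, which is exactly the "earliest mention, before the final title" case.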

Beyond transcriptions

We finished transcribing and encoding the letters around the fifth week of the semester, but our engagement with this body of correspondence did not stop there. For the rest of the Digital Media Technology course, we used the small repository of transcriptions we had created as a case study while practicing with relational databases and social network analysis. It was amazing to see how much information and insight could be gained from the simple tags we had added to our letters. The visualization below, for example, represents the relationships between the senders and recipients of the letters: the bigger the bubble, the more letters they sent or received. Although we only had access to a very limited corpus, it is immediately evident how useful such a visualization would be for researchers interested in, say, which authors the Hogarth Press corresponded with most frequently. Similar networks could be built to explore other kinds of relationships: who mentions whom in their letters, and how often, or which cities letters most frequently traveled between. Once geographical coordinates are added to the known locations of senders and recipients, the final version will look something like this. A network graph of the relations between the senders and recipients of the correspondence can even be plotted on a map with tools like Gephi or Palladio.

For someone used to their academic work being hardly more than a means to an end, it is a nice feeling to realize that my work on this project might actually help some future researcher. In fact, I enjoyed the transcription process so much that I am now considering a similar project for my Master’s thesis. I can only hope, for the sake of next year’s students, that this collaboration continues for a long time: with around twenty students a year, it will take a while before they run out of letters to transcribe.
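The bubble sizes in a sender/recipient network like the one described above come down to simple counting over the tagged metadata. The following Python sketch assumes (the post does not say which elements were used) that each letter records its sender and recipient in TEI’s `<correspDesc>`, with `<correspAction type="sent">` and `<correspAction type="received">`. The correspondents are invented placeholders. The sketch counts letters sent plus received per person (node size) and letters per pair of correspondents (edge weight).

```python
import xml.etree.ElementTree as ET
from collections import Counter

NS = "http://www.tei-c.org/ns/1.0"  # the TEI P5 namespace
TEI = {"tei": NS}


def corresp_desc(sender: str, recipient: str) -> str:
    """Build a minimal TEI <correspDesc> for one (invented) letter."""
    return (
        f'<correspDesc xmlns="{NS}">'
        f'<correspAction type="sent"><persName>{sender}</persName></correspAction>'
        f'<correspAction type="received"><persName>{recipient}</persName></correspAction>'
        f"</correspDesc>"
    )


def sender_and_recipient(xml_text: str) -> tuple[str, str]:
    """Read the sender and recipient names from a <correspDesc>."""
    root = ET.fromstring(xml_text)

    def name(action_type: str) -> str:
        path = f"tei:correspAction[@type='{action_type}']/tei:persName"
        return root.find(path, TEI).text

    return name("sent"), name("received")


def build_network(letters):
    """Node size = letters sent + received; edge weight = letters per pair."""
    node_size = Counter()
    edge_weight = Counter()
    for xml_text in letters:
        sender, recipient = sender_and_recipient(xml_text)
        node_size[sender] += 1
        node_size[recipient] += 1
        # A frozenset makes the edge undirected: A->P and P->A share one edge.
        edge_weight[frozenset((sender, recipient))] += 1
    return node_size, edge_weight


# Invented placeholder correspondents, not names from the MAPP letters.
LETTERS = [
    corresp_desc("Author A", "Publisher P"),
    corresp_desc("Publisher P", "Author A"),
    corresp_desc("Publisher P", "Printer Q"),
]

sizes, weights = build_network(LETTERS)
print(sizes.most_common())  # [('Publisher P', 3), ('Author A', 2), ('Printer Q', 1)]
```

The two counters map onto the visualization: node size would drive the bubble size and edge weight the line thickness. For a tool such as Gephi, the edges could then be exported as a simple Source/Target/Weight table.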

 

 
