The largest digital humanities project in Britain is developing groundbreaking tools to uncover hidden stories from across whole collections of historic newspapers, books and maps.
Living With Machines has already harnessed artificial intelligence to “read” and mine information from Ordnance Survey maps, and its wider findings are intended to transform our understanding of the Industrial Revolution.
The project, from the British Library and the Alan Turing Institute in collaboration with university researchers, received £9.2 million from UK Research and Innovation in 2018. Through their study of 19th-century industrialisation, the team are exploring how new technologies and data science can bring to light stories and trends that would otherwise be missed in troves of archive material.
A central aim is to provide inspiration and new tools and methods for the British Library and other libraries, archives and heritage organisations to “unlock” their digitised historical collections for researchers and the public. Mia Ridge, digital curator at the British Library, said: “It is the biggest digital humanities project that has been funded in the UK and, in terms of this setup, probably unique anywhere. The British Library has between 180 and 200 million collection items, so computational methods are the only way that we’re ever going to really grapple with the scale of our collection.”
As for the focus on industrialisation, she said: “We’ve applied this question of ‘the impact of machines’ in a meta sense to look at the 19th century, when mechanisation was a huge issue that touched people’s lives in lots of different ways. In particular, we’re trying to take on the challenge of understanding voices such as working-class voices that weren’t as represented in media.”
The team, including computer and data scientists, historians, library professionals and geographers, are working with sources including maps, census records and the British Newspaper Archive of over 56 million digitised pages from historic papers. Ridge said: “Searching at scale, even with computers, takes a really long time. So we’re looking for ways to speed that up and developing tools to manage the process. To do this, we’re moving terabytes of data between systems — some of it’s like plumbing that we need to do to get the exciting shiny things happening. We’re releasing our tools and publishing data sets as we go on our GitHub and research repositories, linked from our website. That’s where people can go to look for code they can reuse.”
She added: “At the moment there’s work on how the coming of the railways affected the experience of people in the 19th century — so where they lived and what they had access to. That’s building on traditional research and census records, but we have also taught software how to read Ordnance Survey maps so that we can understand at scale how the country was changing in this period.”
Specifically, the team trained their MapReader tool to identify and highlight “railspace”, such as tracks, stations, depots and embankments, on over 16,000 map sheets across Britain, providing a new big picture of the spatial impact of the railways. They also trained the tool to identify and highlight buildings in order to illustrate the growth of built-up areas in granular detail. In future, MapReader could be used to identify and study all kinds of features across diverse collections of maps or images.
The project has harnessed crowdsourcing, with thousands of volunteers checking whether texts identified by algorithms relate to specific topics as predicted. As one example, Ridge said: “We entered queries in our newspaper database to find stories that were possibly about accidents involving machines and asked people to look and confirm whether an accident had happened.
“That means we got a set of articles that were definitely about accidents, and could understand further the age and gender of those involved. Then we could do additional tasks to find out more about the accidents, like their locations. Obviously mines, workshops and factories, but also homes, and lots and lots of transport accidents. People adjusting to trains has become an unexpectedly huge story in our work. Not only the imposition of rail on the landscape and rail changing the speed and accessibility of transport. But also people just not understanding that trains don’t stop like a horse would.”
She added: “Sometimes the articles are about inquests and based on renditions of the evidence given, and that is one of the few chances to hear working-class voices and dialect in newspapers.”
Ridge said simple keyword searching was too blunt an instrument. For example, an 1888 newspaper article from Blackburn mentions “death” and “machinery”, but is about a property sale rather than an accident. The volunteers’ work can help to train future search algorithms and reduce “false positives”. Nevertheless, Ridge stresses that the project involves harnessing technology to help researchers marshal bodies of evidence, rather than having computers “do” research. She said: “It’s that scaling up of expert human attention. Our abilities to read language still vastly outstrip computers’ abilities, particularly with historical language where terms were used differently.”
The news articles checked by the volunteers bring home the dangers of 19th-century working conditions. In one accident at Swainson, Birley and Co’s mill at Preston, in 1849, an employee investigating a gas leak caused a small explosion when he used a lamp to light the spot where he was digging with a pickaxe in the mill yard. This caused the lights in the mill to go off and the weavers threw their looms out of gear to prevent damage to the machinery. This, in turn, caused a 40hp engine, suddenly relieved of all work, to become “ungovernable, acquiring through the impetus given it, such a velocity, that the flywheel, the usual speed of which was about 50 revolutions per minute, performed upwards of 200 revolutions, thereby causing it to be shattered into fragments; which, in their flight, dealt death and destruction.”
Newspaper accounts of the incident detailed damage to the building and the dreadful injuries of the two workers who died. They also noted that it was “fortunate that the only injury sustained by the large engine is the breaking of its fly-wheel”.
Another project strand has looked at how types and definitions of machines changed over time. In the early 19th century, the “machines” commonly mentioned in newspapers included threshing machines, steam engines and locomotives. By the 1870s, they included “dog skinning” machines — probably a joke, according to Ridge — as well as washing machines and bicycles.
Some of the project’s early findings have been published in journals, and more papers will appear in due course. The cumulative findings will be summarised on the project website and in book form. The free Living With Machines exhibition running at Leeds City Museum until next January also draws on the research and explains how industrialisation led to developments such as football leagues and the 9-to-5 working day.
Ridge said: “One of the things I’m most excited about with the exhibition is that we collaborated with Leeds folk musicians to set ballads from the British Library’s collections to music, and they’ve recorded these incredible songs.
“The ballads are a really important part of the story because they represent working-class people warning other working-class people about the impact of mechanisation. There’s one called The Felting Machine that warns how this one machine can take so many jobs. We’re working to get the recorded ballads online.”
Ridge said there were various ways for people interested in the project to get involved and find out more. “We’re producing workshops and tutorials for the wider library sector and cultural heritage sector — and for community historians as well. It’s not only, ‘Hey, we’ve got this cool tool that you can use to do fuzzy searching around OCR [optical character recognition] on historical records’, say, but, ‘Here’s a tutorial so you can try it yourself with a sample data set and understand what it means’. It’s really important that we support the hardcore technical stuff with ways in, showing how people might apply it in their research or teaching or how their local history society might work with the tools.”
The top image, The Dinner Hour, Wigan, by Eyre Crowe, in Manchester Art Gallery, depicts factory girls during their break in the 1870s. Photo: Alamy.