Linking digitisation efforts of natural sciences and humanities

Special symposium at the Digital Humanities in the Nordic Countries 2018, 7 March 2018, 9:00–12:30

Open to the public; no registration fee.

Venue: University of Helsinki, Porthania building, Yliopistonkatu 3, Helsinki, Finland. The symposium will be held in lecture hall Porthania IV.

Organisers: Hannu Saarenmaa, Leif Schulman (Finnish Museum of Natural History, University of Helsinki), Ana Casino (The Consortium of European Taxonomic Facilities)

The call for presentations and participation is open until 20 February 2018. Contact


Digitalisation and digitisation are taking major steps forward in the natural sciences. The Global Biodiversity Information Facility, operational since 2001, will soon pass one billion records on its portal. In Europe, building on the joint efforts of the collections-based research institutions that form the CETAF community, the Distributed System of Scientific Collections (DiSSCo), a new ESFRI proposal, will contribute by digitising and openly sharing data on an additional one billion specimens held by natural history museums. ICEDIG (Innovation and Consolidation for large scale DIGitisation of natural heritage) contributes to the design process of DiSSCo. Coordinated by the Finnish Museum of Natural History of the University of Helsinki, ICEDIG will hold its opening conference in Helsinki on 5–6 March 2018. A significant task of ICEDIG is to explore the synergies between the digitalisation of the natural sciences and of the humanities. This session will open the discussion in that regard. Collaborations with other research infrastructures in the arts and humanities, such as DARIAH, and with initiatives related to cultural assets, such as Europeana and the Biodiversity Heritage Library (BHL), are to be sought for benchmarking and for leveraging each other’s efforts and resources.

Besides keeping billions of physical specimens, the world’s natural science collections hold vast materials documenting the activities of scientists. The bulk consists of detailed diaries, i.e., the field notebooks of thousands of scientists who have travelled across the world in the past 250 years. Like ships' logbooks, these are increasingly being digitised. The trails of scientists across seas, jungles, and far-away cultures can be accurately traced through the biological and geological samples that they collected on their travels. These samples are still held in museums and are documented in detail in the field notebooks. In addition, grey literature and photographs that were never published are abundant in natural science museums. These are being digitised in many projects around the world.

A case in point: in 2008–2012, the FP7 project OpenUp! contributed over a million objects from European natural history museums to the Europeana portal. This needs to continue!

This workshop will give an overview of the ongoing digitalisation and digitisation efforts in the collections-based natural sciences. It will portray leading digitisation efforts across the world that have links to the humanities, and it will seek synergies by better linking efforts in the natural sciences and the humanities.



Distributed System of Scientific Collections: Links with cultural heritage

Dimitris Koureas (Naturalis, The Netherlands), Ana Casino (CETAF)


Why digitize? Cross-returns of discipline-specific efforts

Arturo Ariño (University of Navarra, Spain)

ABSTRACT: Scientists invest huge amounts of time in doing research, and they increasingly contribute to the data commons. However, some societal issues may prevent them from contributing more, even though it is society itself, and not only science, that stands to benefit from increased effort. I will summarize the "data commons" concept and, through selected examples, show how digitization in the scientific realm crosses boundaries and results in discipline-independent knowledge that can be put to any use. Some societal aspects affecting scientists' willingness to digitize, both at the individual and at the country level, will also be discussed.

Opening up literature: extracting data from it and making it widely reusable

Donat Agosti (Plazi GmbH, Switzerland)

ABSTRACT: Scientific literature includes many facts. Facts (texts and figures) are not copyrighted and thus can be extracted and reused, irrespective of whether the publication is closed or open access. Extracted facts can be linked to the source article, to other facts in the same article, or to different articles. In this lecture the Plazi workflow, TreatmentBank and the Biodiversity Literature Repository / Zenodo will be explained in the context of extracting facts from the biodiversity literature.

Digitisation of the entomological field notebooks of the Finnish Museum of Natural History

Hannu Saarenmaa, Tommi Koskinen, Jan Salonen, Pauliina Wäli (University of Helsinki, Finland)

ABSTRACT: The Finnish Museum of Natural History keeps about three hundred collection notebooks that include data relating to the older parts of the museum's insect collections. These entomological notebooks are essentially catalogues containing sample numbers and collection data about insect specimens. The oldest notebooks date back to the 1860s and include the collection data of several entomologists in one book. However, most of the books are the personal notebooks of a single collector, whether a professional entomologist or a hobbyist. The books that are currently being digitized cover the period from those early days to about the 1960s. In addition to collection data, the notebooks can include quite detailed information about the geology, soil characteristics and weather conditions of the collection locality, as well as other habitat details. One can also find hand-drawn maps, excursion budget calculations and various descriptions of the collection events and other happenings. Contents of the notebooks were digitised in 2008–2009, and records of over 1 million collection specimens were turned into a structured database. Full text and the data are available through

Elements of Robust Pipeline Design for Mass Digitization

Mark Hereld, Nicola Ferrier (The University of Chicago and Argonne National Laboratory, USA)

ABSTRACT: Mass digitization of large collections faces significant challenges owing to the need for speed, the heterogeneity of objects in the digitization stream, and the cost of humans in the process. We discuss the design process in terms of parallelism, pipelining, fault tolerance, constraint dependencies, quality assurance, object tracking and provenance, and the roles of computational support. This systematic analysis of the process aims to identify priorities and principles that are generally applicable to the design of automated mass digitization pipelines.